set.seed(1)
x <- 1:10
y <- 2 + 3 * x + rnorm(10, 0, 2)
X <- cbind(1, x)
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y
model <- lm(y ~ x)
cbind(beta_hat, coef(model))
      [,1]     [,2]
  1.662353 1.662353
x 3.109464 3.109464
Recall our objective: \[\text{SSE} = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^T(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})\]
Goal: Find \(\hat{\boldsymbol{\beta}}\) that minimizes SSE
Method: Take the derivative with respect to \(\boldsymbol{\beta}\) and set equal to zero
Start with: \[\text{SSE} = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^T(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})\]
Expand: \[\text{SSE} = \mathbf{y}^T\mathbf{y} - \mathbf{y}^T\mathbf{X}\boldsymbol{\beta} - \boldsymbol{\beta}^T\mathbf{X}^T\mathbf{y} + \boldsymbol{\beta}^T\mathbf{X}^T\mathbf{X}\boldsymbol{\beta}\]
Since \(\mathbf{y}^T\mathbf{X}\boldsymbol{\beta} = \boldsymbol{\beta}^T\mathbf{X}^T\mathbf{y}\) (both are scalars): \[\text{SSE} = \mathbf{y}^T\mathbf{y} - 2\boldsymbol{\beta}^T\mathbf{X}^T\mathbf{y} + \boldsymbol{\beta}^T\mathbf{X}^T\mathbf{X}\boldsymbol{\beta}\]
Matrix calculus rules we need:
- \(\frac{\partial}{\partial \boldsymbol{\beta}} \mathbf{a}^T\boldsymbol{\beta} = \mathbf{a}\)
- \(\frac{\partial}{\partial \boldsymbol{\beta}} \boldsymbol{\beta}^T\mathbf{A}\boldsymbol{\beta} = 2\mathbf{A}\boldsymbol{\beta}\) (for symmetric \(\mathbf{A}\))
Taking the derivative: \[\frac{\partial \text{SSE}}{\partial \boldsymbol{\beta}} = -2\mathbf{X}^T\mathbf{y} + 2\mathbf{X}^T\mathbf{X}\boldsymbol{\beta}\]
Set the derivative equal to zero: \[-2\mathbf{X}^T\mathbf{y} + 2\mathbf{X}^T\mathbf{X}\boldsymbol{\beta} = \mathbf{0}\]
Divide by 2: \[-\mathbf{X}^T\mathbf{y} + \mathbf{X}^T\mathbf{X}\boldsymbol{\beta} = \mathbf{0}\]
Rearrange: \[\mathbf{X}^T\mathbf{X}\boldsymbol{\beta} = \mathbf{X}^T\mathbf{y}\]
Solve for \(\hat{\boldsymbol{\beta}}\): \[\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}\]
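In R, the normal equations \(\mathbf{X}^T\mathbf{X}\boldsymbol{\beta} = \mathbf{X}^T\mathbf{y}\) can be solved with `solve(A, b)` rather than forming the explicit inverse; a minimal sketch using the same simulated data as above:

```r
# Simulate the same data as before
set.seed(1)
x <- 1:10
y <- 2 + 3 * x + rnorm(10, 0, 2)
X <- cbind(1, x)

# Solve t(X) %*% X %*% beta = t(X) %*% y directly,
# avoiding an explicit matrix inverse
beta_hat <- solve(crossprod(X), crossprod(X, y))
beta_hat
```

`crossprod(X)` computes \(\mathbf{X}^T\mathbf{X}\) more efficiently than `t(X) %*% X`, and `solve(A, b)` is more stable than `solve(A) %*% b`.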
Any matrix \(\mathbf{X}\) can be decomposed as: \[\mathbf{X} = \mathbf{Q}\mathbf{R}\]
Where:
- \(\mathbf{Q}\) has orthonormal columns (\(\mathbf{Q}^T\mathbf{Q} = \mathbf{I}\))
- \(\mathbf{R}\) is upper triangular
In words: Break down \(\mathbf{X}\) into perpendicular directions (\(\mathbf{Q}\)) and the coefficients that scale and combine them (\(\mathbf{R}\))
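Base R computes this factorization with `qr()`; a short sketch checking the two defining properties on the simulated design matrix from earlier:

```r
# Build the same design matrix as before
x <- 1:10
X <- cbind(1, x)

qr_obj <- qr(X)      # compact QR factorization
Q <- qr.Q(qr_obj)    # n x p matrix with orthonormal columns
R <- qr.R(qr_obj)    # p x p upper-triangular matrix

# Check the defining properties:
all.equal(crossprod(Q), diag(2), check.attributes = FALSE)  # Q^T Q = I
all.equal(Q %*% R, X, check.attributes = FALSE)             # X = QR
```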
Traditional approach requires: \[\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}\]
Problems with \((\mathbf{X}^T\mathbf{X})^{-1}\):
- Forming \(\mathbf{X}^T\mathbf{X}\) squares the condition number of \(\mathbf{X}\)
- Inverting it is numerically unstable when the columns of \(\mathbf{X}\) are nearly collinear
What do I mean by “unstable”?
Error in solve.default(crossprod(X)) : system is computationally singular: reciprocal condition number = 3.11914e-17
If \(\mathbf{X} = \mathbf{Q}\mathbf{R}\), then: \[\mathbf{X}^T\mathbf{X} = (\mathbf{Q}\mathbf{R})^T(\mathbf{Q}\mathbf{R}) = \mathbf{R}^T\mathbf{Q}^T\mathbf{Q}\mathbf{R} = \mathbf{R}^T\mathbf{R}\]
So: \[ \begin{align} \hat{\boldsymbol{\beta}} &= (\mathbf{R}^T\mathbf{R})^{-1}\mathbf{R}^T\mathbf{Q}^T\mathbf{y} \\&= \mathbf{R}^{-1}(\mathbf{R}^T)^{-1}\mathbf{R}^T\mathbf{Q}^T\mathbf{y}\\& = \mathbf{R}^{-1}\mathbf{Q}^T\mathbf{y} \end{align} \]
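The identity \(\hat{\boldsymbol{\beta}} = \mathbf{R}^{-1}\mathbf{Q}^T\mathbf{y}\) translates directly into R, where `backsolve()` solves the triangular system \(\mathbf{R}\hat{\boldsymbol{\beta}} = \mathbf{Q}^T\mathbf{y}\) without ever forming an inverse; a sketch on the simulated data from earlier:

```r
# Same simulated data as before
set.seed(1)
x <- 1:10
y <- 2 + 3 * x + rnorm(10, 0, 2)
X <- cbind(1, x)

qr_obj <- qr(X)
Q <- qr.Q(qr_obj)
R <- qr.R(qr_obj)

# beta_hat = R^{-1} Q^T y, computed by back-substitution on triangular R
beta_qr <- backsolve(R, crossprod(Q, y))
cbind(beta_qr, coef(lm(y ~ x)))   # the two columns agree
```

This is essentially what `lm()` does internally: it fits by QR decomposition, not by inverting \(\mathbf{X}^T\mathbf{X}\).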
Given \(\mathbf{X} = \mathbf{Q}\mathbf{R}\), verify that: \[\hat{\boldsymbol{\beta}} = \mathbf{R}^{-1}\mathbf{Q}^T\mathbf{y}\]
Hint: Use the fact that \(\mathbf{Q}^T\mathbf{Q} = \mathbf{I}\)
Given: \[\mathbf{R} = \begin{bmatrix} 2 & 1 & 3 \\ 0 & 4 & 2 \\ 0 & 0 & 5 \end{bmatrix}, \quad \mathbf{Q}^T\mathbf{y} = \begin{bmatrix} 8 \\ 12 \\ 15 \end{bmatrix}\]
Solve: \(\mathbf{R}\hat{\boldsymbol{\beta}} = \mathbf{Q}^T\mathbf{y}\)
\[\begin{bmatrix} 2 & 1 & 3 \\ 0 & 4 & 2 \\ 0 & 0 & 5 \end{bmatrix} \begin{bmatrix} \hat\beta_1 \\ \hat\beta_2 \\ \hat\beta_3 \end{bmatrix} = \begin{bmatrix} 8 \\ 12 \\ 15 \end{bmatrix}\]
Step 1: Solve bottom row first \[5\hat\beta_3 = 15 \implies \hat\beta_3 = 3\]
Step 2: Substitute into second row \[4\hat\beta_2 + 2(3) = 12 \implies 4\hat\beta_2 = 6 \implies \hat\beta_2 = 1.5\]
Step 3: Substitute into first row \[2\hat\beta_1 + 1(1.5) + 3(3) = 8 \implies 2\hat\beta_1 = -2.5 \implies \hat\beta_1 = -1.25\]
\[\hat{\boldsymbol{\beta}} = \begin{bmatrix} -1.25 \\ 1.5 \\ 3 \end{bmatrix}\]
Key insight: Work backwards through the triangular matrix, using previously solved values to eliminate unknowns in each step.
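The same back-substitution, done by hand above, is what `backsolve()` performs; a sketch using the \(\mathbf{R}\) and \(\mathbf{Q}^T\mathbf{y}\) from this example:

```r
# Upper-triangular R from the worked example (filled column by column)
R <- matrix(c(2, 0, 0,
              1, 4, 0,
              3, 2, 5), nrow = 3)
Qty <- c(8, 12, 15)   # Q^T y

# Back-substitution: solves the bottom row first, then works upward
beta_hat <- backsolve(R, Qty)
beta_hat   # -1.25  1.50  3.00
```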
Condition number comparison:
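A sketch of the comparison in R (the nearly collinear columns here are an illustrative construction): forming \(\mathbf{X}^T\mathbf{X}\) roughly squares the condition number, which is why the explicit-inverse approach breaks down first.

```r
set.seed(1)
x1 <- rnorm(100)
x2 <- x1 + rnorm(100, sd = 1e-6)   # nearly collinear with x1
X <- cbind(1, x1, x2)

kappa(X, exact = TRUE)             # condition number of X
kappa(crossprod(X), exact = TRUE)  # roughly the square of the above
```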
Given \(\mathbf{H} = \mathbf{Q}\mathbf{Q}^T\), calculate \(\hat{\mathbf{y}}\) using the QR elements.
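One way to check this exercise in R, assuming the simulated data from earlier: since \(\mathbf{H} = \mathbf{Q}\mathbf{Q}^T\), the fitted values are \(\hat{\mathbf{y}} = \mathbf{H}\mathbf{y} = \mathbf{Q}\mathbf{Q}^T\mathbf{y}\).

```r
set.seed(1)
x <- 1:10
y <- 2 + 3 * x + rnorm(10, 0, 2)
X <- cbind(1, x)

Q <- qr.Q(qr(X))

# Fitted values via the hat matrix H = Q Q^T, without forming H explicitly
y_hat <- Q %*% crossprod(Q, y)

all.equal(as.vector(y_hat), as.vector(fitted(lm(y ~ x))))  # TRUE
```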
Mean Squared Error (MSE): \[\text{MSE} = \frac{\text{SSE}}{n-p} = \frac{\sum_{i=1}^n (y_i - \hat{y}_i)^2}{n-p}\]
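The MSE formula can be checked in R against what `summary.lm` reports (its `sigma` is the square root of the MSE); a sketch on the simulated data from earlier, where \(p = 2\) coefficients are estimated:

```r
set.seed(1)
x <- 1:10
y <- 2 + 3 * x + rnorm(10, 0, 2)
model <- lm(y ~ x)

n <- length(y)
p <- 2   # number of estimated coefficients (intercept and slope)

mse <- sum(residuals(model)^2) / (n - p)
c(mse, summary(model)$sigma^2)   # same quantity computed two ways
```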
Simple Linear Regression: \(y = \beta_0 + \beta_1 x + \varepsilon\)
Multiple Regression: \(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon\)
Given the regression: \(\text{Salary} = 40000 + 2000 \times \text{YearsExperience} + 5000 \times \text{HasDegree}\)
Where HasDegree = 1 if person has degree, 0 otherwise
Interpret each coefficient:
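A quick numerical check of the fitted equation (coefficients taken from the slide; the `salary` helper is illustrative): each extra year of experience adds 2000, and holding a degree adds a one-time 5000.

```r
# Predicted salary from the slide's fitted equation
salary <- function(years, has_degree) {
  40000 + 2000 * years + 5000 * has_degree
}

salary(5, 1)   # 5 years of experience, with a degree: 55000
salary(5, 0)   # same experience, no degree: 50000
```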
Practice: fit a linear model to the teengamb dataset from the faraway package using the lm function.